Abstract: Scheduling Bag-of-Tasks (BoT) applications on the
cloud can be more challenging than on grid and cluster environments.
This is because a user may have a budgetary constraint
or a deadline for executing the BoT application in order to 
keep the overall execution costs low. The research in this paper 
is motivated to investigate task scheduling on the cloud, given 
two hard constraints based on a user-defined budget and a
deadline. A heuristic algorithm is proposed and implemented 
to satisfy the hard constraints for executing the BoT application 
in a cost effective manner. The proposed algorithm is evaluated 
using four scenarios that are based on the trade-off between 
performance and the cost of using different cloud resource 
types. The experimental evaluation confirms the feasibility of 
the algorithm in satisfying the constraints. The key observation 
is that multiple resource types can be a better alternative to using 
a single type of resource. 

I. Introduction 

Bag-of-Tasks (BoT) is the term given to a collection of 
independent and identical tasks, which can be executed in any 
order. BoT applications are a common way of breaking up 
a complex problem into smaller independent tasks in both 
scientific and industrial communities. BoT applications are 
normally executed on distributed environments in order to 
achieve high degrees of parallelism. BOINC [1] is an example
of a BoT framework; it assigns tasks to volunteer computing
resources and is used in more than 80 different projects
ranging from astronomy to the physical sciences.

BoT applications are commonly executed on Grid and
Cluster systems. Both environments consist of multiple
interconnected machines, which are already running and shared
between different organisations (Grid environment), or groups
in the same organisation (Cluster environment).

Cloud computing [2] is considered a more accessible
alternative as it offers resources which a user can acquire
on-demand through a pay-per-use model. However, in contrast to
other forms of distributed computing environments, a cloud 
user has to decide which resources (instance types etc.) and 
how many resources need to be acquired before actually using 
and paying for them. A user cannot therefore greedily acquire 
as many resources as possible before deciding which ones are 
suitable for her application. Moreover, as cloud resources are 
pay-per-use the cost of the execution has to be taken into 
account. As a result, there is a trade-off between performance 
and cost: in order to have better performance, i.e. lower
execution time, the cost has to be increased. Nevertheless, it is
challenging to balance the trade-off so that a user can achieve
the best performance with the lowest cost.

This paper explores executing BoT applications on the cloud 
with user defined hard constraints. Hard constraints are defined 
as conditions that always need to be satisfied. For example, 
consider the following two hard constraints: one user might 
want to keep the cost of execution within a certain budget 
constraint, while another user would prefer that the application 
execution must be finished within a given time frame or 
deadline constraint. 

In this paper, we aim to optimise the execution of BoT 
applications on the cloud. We investigate two scenarios in 
which a user provides a hard constraint in the form of a budget 
(the maximum amount of money that a user can spend) or a 
deadline (the maximum amount of time that an execution can 
take). If the budget constraint is given, our approach not only 
satisfies the constraint, but also minimises the execution time. 
Similarly, if the deadline constraint is provided, our approach 
aims to also minimise the total cost for executing the BoT. 

The contributions of this paper are as follows: i) the mathematical
model for scheduling tasks on the cloud with a given
hard constraint, ii) the heuristic algorithm for cost effective
scheduling, and iii) the evaluation considering different trade-offs
between multiple options provided by cloud providers.

The remainder of this paper is organised as follows. Section
II presents a mathematical model of the platform and the
problem. Section III proposes a heuristic algorithm considering
the hard constraints for executing tasks on the cloud. Section
IV describes the scheduling of tasks. Section V presents an
evaluation of the heuristic algorithm on four possible scenarios.
Section VI highlights the work related to the research
reported in this paper. Section VII concludes this paper.


II. Mathematical Models

In this section, mathematical models that represent the cloud 
platform and the problem of executing the BoT on the platform 
given the hard constraints are considered. 


A. Platform Model 

Let $IT = \{it_1 \ldots it_M\}$ denote the list of $M$ types of cloud
instance (for example, public cloud providers such as Amazon
provide a variety of instance types). Each instance type $it \in IT$
can be characterised by two properties: (i) cost per hour
$c_{it}$, which is the amount spent for using a Virtual Machine
(VM) of an instance type in an hour, and (ii) performance $p_{it}$,
which is the time taken to execute a task in seconds. Assume
$T = \{t_1 \ldots t_N\}$ is the list of $N$ tasks.

As the goal is to create VMs of different instance types
and assign tasks to each VM, let $VM = \{vm_1 \ldots\}$ be the
execution plan containing the collection of VMs. For a given
$vm \in VM$, the instance type is $it_{vm} \in IT$ and the tasks
assigned to it are $T_{vm} \subseteq T$. A VM can be represented as
$vm = (it_{vm}, T_{vm})$, which is a pair of the instance type and
the tasks assigned to it. It should be noted that the size
of $VM$, the instance type of each VM, and the collection of tasks
assigned to each VM are unknown and need to be determined.

Normally, some amount of time is required to boot a VM
into a usable state. If $st$ is this time for each VM regardless
of its instance type, then the execution time of a VM is:

$$exec_{vm} = \sum_{t \in T_{vm}} p_{it_{vm}} + st = |T_{vm}| \times p_{it_{vm}} + st \tag{1}$$

Cloud VMs, for example, from public providers, such as 
Amazon, are charged by the hour (3600 seconds). A user pays 
for an hour even if only a few seconds of the hour are utilised. 
The cost of running a VM is: 

$$cost_{vm} = \left\lceil \frac{exec_{vm}}{3600} \right\rceil \times c_{it_{vm}} \tag{2}$$

Since all VMs are running simultaneously, the overall 
execution time is the execution time of the slowest VM: 

$$exec = \max_{vm \in VM} exec_{vm} \tag{3}$$

The total cost of executing all tasks is the sum of costs of 
each VM represented as: 

$$cost = \sum_{vm \in VM} cost_{vm} \tag{4}$$
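The platform model can be illustrated with a short Python sketch. It is only an illustration of Equations 1 to 4 under the stated definitions; the `VM` class and the function names are hypothetical and are not part of the original work.

```python
from dataclasses import dataclass
from math import ceil

SECONDS_PER_HOUR = 3600


@dataclass
class VM:
    cost_per_hour: float  # c_it of the VM's instance type
    secs_per_task: float  # p_it of the VM's instance type
    n_tasks: int          # |T_vm|, number of tasks assigned to this VM


def vm_exec(vm: VM, st: float) -> float:
    """Equation 1: time spent on the assigned tasks plus the start-up time."""
    return vm.n_tasks * vm.secs_per_task + st


def vm_cost(vm: VM, st: float) -> float:
    """Equation 2: every started hour is charged in full."""
    return ceil(vm_exec(vm, st) / SECONDS_PER_HOUR) * vm.cost_per_hour


def plan_exec(plan, st: float) -> float:
    """Equation 3: the slowest VM determines the overall execution time."""
    return max(vm_exec(vm, st) for vm in plan)


def plan_cost(plan, st: float) -> float:
    """Equation 4: the total cost is the sum of the per-VM costs."""
    return sum(vm_cost(vm, st) for vm in plan)


# Illustrative plan: one cheap, slow VM and one expensive, fast VM.
plan = [VM(cost_per_hour=1, secs_per_task=32, n_tasks=53),
        VM(cost_per_hour=16, secs_per_task=2, n_tasks=890)]
print(plan_exec(plan, st=10), plan_cost(plan, st=10))
```

Note that a VM running for 3610 seconds is charged for two full hours, which is why the ceiling appears in Equation 2.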

B. Problem Model 

In this paper, two hard constraints are considered. The first 
referred to as a ‘budget constraint’ is the maximum amount 
of money that a user is willing to spend for executing a BoT 
on the cloud. The second is a ‘time constraint’, which is 
the maximum time that can be allowed for completing the 
execution of the BoT. 

The problem of executing a BoT on the cloud given a budget 
constraint B is to minimise the overall execution time while 
keeping the cost less than or equal to the budget at the same 
time. This is represented as follows: 

$$\text{minimise } exec \quad \text{such that } cost \le B \tag{5}$$

Similarly, the problem of executing a BoT on the cloud
given a time constraint $D$ is to minimise the total cost while
keeping the overall execution time less than or equal to the
deadline. This is represented as follows:

$$\text{minimise } cost \quad \text{such that } exec \le D \tag{6}$$


C. Accounting for Throughput

In order to find an optimal execution plan based on the given
hard constraints, the number of VMs for each instance type
and the assignment of each task must be decided. This is a
hard problem since the number of tasks in an application is
large. In this section, we simplify the problem by modelling
it using throughput, which is the number of tasks that can be
executed during a fixed period of time on a VM.

For any instance type $it \in IT$, as considered above, $p_{it}$ is
the time in seconds required to execute a task. The number of
tasks executed per second is $1/p_{it}$ and the throughput of a VM
in one hour is:

$$th^{hr}_{it} = \left\lfloor \frac{3600}{p_{it}} \right\rfloor \tag{7}$$

The floor function is applied since a task has to be fully
executed by a VM.

The total throughput of all VMs of an instance type $it$ in
one hour is:

$$TH_{it} = |VM_{it}| \times th^{hr}_{it} \tag{8}$$

and the total throughput of all VMs is:

$$TH = \sum_{it \in IT} TH_{it} \tag{9}$$

Performance on the cloud is maximised when $TH$ is maximised.

The cost of running one VM for two hours is the same
as the cost of running two VMs for one hour. Hence, it is
assumed that all VMs run for no more than one hour, and the
total cost is calculated as:

$$cost = \sum_{it \in IT} |VM_{it}| \times c_{it} \tag{10}$$

Given the budget constraint $B$, the problem of executing a
BoT on the cloud is modelled as:

$$\text{maximise } TH \quad \text{such that } cost \le B \tag{11}$$

If the time constraint $D$ is given, the number of tasks that
can be executed by an instance type $it$ within the deadline is:

$$th^{D}_{it} = \left\lfloor \frac{D - st}{p_{it}} \right\rfloor \tag{12}$$

Given the time constraint $D$, the total throughput is:

$$TH^{D} = \sum_{it \in IT} |VM_{it}| \times th^{D}_{it} \tag{13}$$

Then, taking throughput into account, the problem model is:

$$\text{minimise } cost \quad \text{such that } TH^{D} \ge N \tag{14}$$

In comparison to solving Equation 5 (or Equation 6),
Equation 11 (or Equation 14) is less complex and can be
solved easily since it depends on the instance types rather
than the tasks.
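The throughput quantities can be sketched in a few lines of Python. This is a minimal illustration which assumes that the hourly throughput of a VM is the floor of 3600 divided by $p_{it}$ and that the throughput within a deadline is the floor of $(D - st)$ divided by $p_{it}$; the function names are hypothetical.

```python
from math import floor


def th_hour(p_it: float) -> int:
    """Tasks one VM completes in an hour (assumed form: floor(3600 / p_it))."""
    return floor(3600 / p_it)


def th_deadline(p_it: float, deadline: float, st: float) -> int:
    """Tasks one VM completes within the deadline, after the start-up time st."""
    return floor((deadline - st) / p_it)


def total_throughput(vm_counts: dict, per_vm: dict) -> int:
    """Sum over instance types of (number of VMs) x (throughput of one VM)."""
    return sum(n * per_vm[it] for it, n in vm_counts.items())


# 19 VMs whose tasks take 32 s each, a deadline of 1800 s and st = 10 s.
per_vm = {"it1": th_deadline(32, deadline=1800, st=10)}
print(per_vm["it1"], total_throughput({"it1": 19}, per_vm))
```

Because the model depends only on the number of VMs per instance type, the search space no longer grows with the number of tasks.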






III. Algorithms 


In this section, we propose a heuristic algorithm to find
an execution plan based on budget and time constraints.
As a starting point, a baseline solution is considered that
provides a list of VMs of a single instance type and satisfies
the constraints. Then, the solution is optimised to either (i)
increase the performance when the budget constraint is given,
or (ii) reduce the cost when the time constraint is provided.
The proposed algorithm accepts a single constraint at a time;
considering multiple constraints simultaneously will be
investigated in the future.


A. Select the Most Cost Effective Instance Type 

For an instance type $it \in IT$, the number of VMs affordable
under a budget constraint $B$ is:

$$|VM_{it}|^{B} = \left\lfloor \frac{B}{c_{it}} \right\rfloor \tag{15}$$

As presented by Equation 7, $th^{hr}_{it}$ is the throughput per
hour of one VM of $it$. Hence, the total throughput produced
by $it$, based on the budget constraint $B$, is:

$$TH^{B}_{it} = |VM_{it}|^{B} \times th^{hr}_{it} \tag{16}$$

Hence, for a given budget, the most cost effective instance
type $it_0$ is the one that results in the highest total throughput:

$$it_0 = \arg\max_{it \in IT} TH^{B}_{it} \tag{17}$$

The number of tasks that can be executed on an instance
type under a time constraint is $th^{D}_{it}$ (refer to Equation 12). So,
the number of VMs required to satisfy the time constraint is:

$$|VM_{it}|^{D} = \left\lceil \frac{N}{th^{D}_{it}} \right\rceil \tag{18}$$

The cost for the VMs is:

$$cost_{it} = |VM_{it}|^{D} \times c_{it} \tag{19}$$

The most cost effective instance type is:

$$it_0 = \arg\min_{it \in IT} cost_{it} \tag{20}$$
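The selection of the most cost effective instance type can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the dictionary layout and the function names are assumptions, the hourly throughput is taken to be the floor of 3600 divided by $p_{it}$, and the number of VMs needed under a deadline is taken to be the ceiling of $N$ divided by the per-VM throughput.

```python
from math import ceil, floor

# Hypothetical layout: instance type name -> (cost per hour, seconds per task).
FAIR = {"it1": (1, 32), "it2": (2, 16), "it3": (4, 8), "it4": (8, 4), "it5": (16, 2)}
COST_EFFECTIVE = {"it1": (1, 32), "it2": (2, 15), "it3": (4, 7), "it4": (8, 3), "it5": (16, 1)}


def select_for_budget(types: dict, budget: float) -> str:
    """Pick the type whose affordable VMs give the highest hourly throughput."""
    def total_throughput(it):
        cost, perf = types[it]
        return floor(budget / cost) * floor(3600 / perf)
    return max(types, key=total_throughput)


def select_for_deadline(types: dict, deadline: float, st: float, n_tasks: int) -> str:
    """Pick the type that finishes all tasks by the deadline at the lowest cost."""
    def cost_of(it):
        cost, perf = types[it]
        per_vm = floor((deadline - st) / perf)
        if per_vm == 0:               # a single task does not fit in the deadline
            return float("inf")
        return ceil(n_tasks / per_vm) * cost
    return min(types, key=cost_of)


print(select_for_budget(COST_EFFECTIVE, budget=16))
print(select_for_deadline(FAIR, deadline=1800, st=10, n_tasks=1000))
```

Ties are resolved in favour of the first type in the dictionary, which is a choice of this sketch and is not specified by the text.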

B. Optimise Instance Type Selection Algorithm 

The execution plan obtained by applying a budget or time constraint
is generated using a single instance type. However, when
multiple instance types are utilised in the execution plan, it
is possible to obtain better performance or to reduce costs
further.

Algorithm 1 optimises the execution plan by replacing VMs
initially present in the execution plan with VMs of other
instance types. This can increase the throughput or
reduce the costs while satisfying the constraints.

The inputs to Algorithm 1 are the list of instance types, the
selected instance type $it_0$ (which is obtained from Equation
17 or Equation 20), the remaining budget (which will be
explained shortly), the time constraint, the number of VMs
to be replaced and a boolean flag indicating if the goal is
to minimise cost (TRUE for minimising cost and FALSE for
maximising throughput).


Algorithm 1 Optimise Instance Type Selection

 1: function OPTIMISE(IT, it_0, B, D, num, minCost)
 2:   exec ← D - st
 3:   if num_{it_0} = 0 ∨ num_{it_0} < num then
 4:     return
 5:   end if
 6:   (it', th_m, num_rped, num_rping) ← (NULL, 0, 0, 0)
 7:   for it_i ∈ {it ∈ IT | it ≠ it_0} do
 8:     if c_{it_i} > B + num × c_{it_0} then
 9:       continue
10:     end if
11:     num' ← 0
12:     if c_{it_i} > B then
13:       num' ← num
14:     end if
15:     num_vms ← ⌊(B + num' × c_{it_0}) / c_{it_i}⌋
16:     if minCost = TRUE ∧ num_vms × c_{it_i} = B + num' × c_{it_0} then
17:       num_vms ← num_vms - 1
18:     end if
19:     th' ← th - ⌊exec / p_{it_0}⌋ × num' + ⌊exec / p_{it_i}⌋ × num_vms
20:     if minCost = TRUE ∧ th' < |T| then
21:       continue
22:     else if minCost = FALSE ∧ th' < th then
23:       continue
24:     else if th' > th_m ∨ (th' = th_m ∧ c_{it_i} < c_{it'}) then
25:       (it', th_m, num_rped, num_rping) ← (it_i, th', num', num_vms)
26:     end if
27:   end for
28:   if it' = NULL then
29:     OPTIMISE(IT, it_0, B, D, num + 1, minCost)
30:   else
31:     num_{it_0} ← num_{it_0} - num_rped
32:     num_{it'} ← num_{it'} + num_rping
33:     if minCost = TRUE then
34:       B ← 0
35:     else
36:       B ← B + num_rped × c_{it_0} - num_rping × c_{it'}
37:     end if
38:     OPTIMISE(IT, it_0, B, D, num, minCost)
39:   end if
40: end function
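A possible Python rendering of Algorithm 1 is sketched below. It is an interpretation under several assumptions rather than the authors' code: the recursion is written as a loop, the current throughput is recomputed from the VM counts using the floor of (D - st) divided by the task time, the check on Line 22 is applied only when maximising throughput (as the accompanying description suggests), and all names are hypothetical.

```python
from math import floor


def optimise(types, it0, counts, budget, deadline, st, n_tasks, min_cost):
    """types: name -> (cost per hour, seconds per task); counts: name -> number of VMs."""
    counts = dict(counts)
    exec_time = deadline - st                                  # Line 2
    per_vm = {it: floor(exec_time / p) for it, (_, p) in types.items()}
    c0 = types[it0][0]
    num = 1                                                    # VMs of it0 to replace
    while counts[it0] != 0 and counts[it0] >= num:             # Line 3
        th = sum(n * per_vm[it] for it, n in counts.items())   # current throughput
        best, th_m, n_rped, n_rping = None, 0, 0, 0            # Line 6
        for it_i, (c_i, _) in types.items():
            if it_i == it0 or c_i > budget + num * c0:         # Lines 7-8
                continue
            replaced = num if c_i > budget else 0              # Lines 11-14
            allowance = budget + replaced * c0
            num_vms = floor(allowance / c_i)                   # Line 15
            if min_cost and num_vms * c_i == allowance:        # Lines 16-17
                num_vms -= 1
            th_new = th - per_vm[it0] * replaced + per_vm[it_i] * num_vms  # Line 19
            if min_cost and th_new < n_tasks:                  # Line 20
                continue
            if not min_cost and th_new < th:                   # Line 22 (assumed scope)
                continue
            if th_new > th_m or (th_new == th_m and best is not None
                                 and c_i < types[best][0]):    # Line 24
                best, th_m, n_rped, n_rping = it_i, th_new, replaced, num_vms
        if best is None:
            num += 1                                           # Line 29
            continue
        counts[it0] -= n_rped                                  # Line 31
        counts[best] = counts.get(best, 0) + n_rping           # Line 32
        if min_cost:
            budget = 0                                         # Line 34
        else:
            budget += n_rped * c0 - n_rping * types[best][0]   # Line 36
    return counts


fair = {"it1": (1, 32), "it2": (2, 16), "it3": (4, 8), "it4": (8, 4), "it5": (16, 2)}
print(optimise(fair, "it1", {"it1": 19}, budget=0, deadline=1800, st=10,
               n_tasks=1000, min_cost=True))
```

With the fair trade-off values, 1000 tasks, a deadline of 1800 seconds and a start-up time of 10 seconds, the sketch replaces 17 VMs of the cheapest type with a single VM of the most expensive type. The equality test on costs is exact only for integer prices; with fractional prices a tolerance would be needed.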

The boolean flag determines the calculation of the remainder
of the budget. If the goal is to maximise throughput, then the
budget is the difference between the budget constraint initially
provided and the current cost of execution. On the other hand,
if the goal is to minimise cost, then no more VMs can be
added since the budget is depleted.

Algorithm 1 is recursive and in each iteration, either the
performance is improved or the cost is reduced without
violating the given constraint. The algorithm terminates when
the number of VMs of $it_0$ is either zero or less than the number
of VMs to be replaced (Line 3).

The algorithm first loops through all instance types except
$it_0$. Instance types that cannot be afforded within the budget
are ignored (Line 8). The allowance to add new VM(s) is
the sum of the remaining budget and the cost of VMs of $it_0$
which are allowed to be replaced (if the remaining budget is
not enough, some VMs of $it_0$ will have to be removed so as not to
exhaust the budget).

Then the number of VMs to be replaced is calculated (Lines
12 and 13). After that, based on the allowance, the number of
replacing VMs is calculated (Line 15). If the goal is to minimise
cost and the cost of additional VMs is exactly equal to the
allowance, the number of replacing VMs has to be decreased
by one so that its cost can be lower than the allowance (Line
16).

Next, the resulting throughput is calculated by adding
the additional throughput from the newly added VMs to
the current throughput. VMs of $it_0$ are removed, and their
throughput is deducted (Line 19).

If the goal is to minimise the total cost, the new throughput
must not be less than the total number of tasks as all tasks must
be executed within the deadline (Line 20). On the other hand,
if the goal is to maximise throughput, the new throughput
cannot be less than the current throughput (Line 22).

All replacement instance types are compared; the instance
type with the highest throughput and, in the case of a tie, the
lowest cost is selected (Line 24). Then, VMs of the new instance
type are added and VMs of $it_0$ are removed if required. This
process is performed by changing the number of VMs of each
instance type (Lines 31 and 32). The remaining budget is updated
(Line 36); the remaining budget is always zero (Line 34) if the
goal is to minimise cost.

The algorithm continues execution with updated values
(Line 38). However, if no replacing instance types are found,
the number of VMs of type $it_0$ to be replaced is increased by
one (Line 29). Each iteration can result in the following: (i) increase
the overall throughput, or (ii) decrease the total cost, or (iii)
increase the number of VMs of $it_0$ to be replaced by one.

IV. Assign Tasks to VMs

The result of Algorithm 1 is a list of VMs of different instance
types. However, the VMs do not have any tasks assigned
to them. Therefore, we propose an additional algorithm (refer
to Algorithm 2) for assigning tasks to VMs. The algorithm
finds a VM in the list which can complete the execution of
the task in the lowest time if a task were assigned to it. This
ensures that the overall execution time is as low as possible.

For example, given two instance types whose performances
are 5 and 8, and assuming there are two VMs of the first type
and one VM of the second type, suppose the current execution
times of the VMs are 10, 12 and 9. If a task is added to each VM, their new
execution times will be 10 + 5 = 15, 12 + 5 = 17 and 9 + 8 = 17.
Hence, a new task should be added to the first VM, whose
performance is 5 and current execution time is 10, so that the
overall execution time after the assignment is lower than the
other options.


Algorithm 2 Assign Tasks to VMs

function ASSIGN(T, VM)
  for t ∈ T do
    vm_0 ← NULL
    exec ← 0
    for vm ∈ VM do
      exec' ← exec_{vm} + p_{it_{vm}}
      if vm_0 = NULL ∨ exec' < exec then
        vm_0 ← vm
        exec ← exec'
      end if
    end for
    T_{vm_0} ← T_{vm_0} ∪ {t}
  end for
  return VM
end function
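Algorithm 2 translates almost directly into Python. The sketch below is illustrative only: the `VM` class and its fields are hypothetical, and the execution time of the chosen VM is updated explicitly, which the pseudocode leaves implicit in the assignment of the task.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    perf: float                    # p_it: seconds needed to execute one task
    exec_time: float = 0.0         # execution time of the VM so far
    tasks: list = field(default_factory=list)


def assign(tasks, vms):
    """Give each task to the VM that would finish it earliest."""
    for t in tasks:
        best, best_exec = None, 0.0
        for vm in vms:
            candidate = vm.exec_time + vm.perf   # finish time if t goes to vm
            if best is None or candidate < best_exec:
                best, best_exec = vm, candidate
        best.tasks.append(t)
        best.exec_time = best_exec
    return vms


# Two VMs with performance 5 and one with performance 8, whose
# current execution times are 10, 12 and 9.
vms = [VM(5, 10), VM(5, 12), VM(8, 9)]
assign(["t1"], vms)
print([vm.tasks for vm in vms])
```

In this example the task goes to the first VM, since 10 + 5 = 15 is the earliest finish time among the three candidates.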


V. Experimental Evaluation 

Our approach of scheduling tasks on the cloud based on a
given budget or deadline constraint is evaluated in this section. The
evaluation considers four different scenarios which are based
on the difference in cost and performance between different
instance types.

A. Performance Gain vs Cost Increase 

One important criterion for selecting a cloud instance is the
trade-off between performance and cost, for example, how
much quicker a task executes when it is moved from one
instance type to another with a higher cost. Given an instance
type $it_0$, the trade-off when employing a more expensive
instance type $it_1$, where $c_{it_1} > c_{it_0}$, can be calculated as
$to_{it_0,it_1} = \frac{p_{it_0} / p_{it_1}}{c_{it_1} / c_{it_0}}$ (the ratio of the change in performance to the
change in cost).

There are three cases for the trade-off between performance 
and cost: 

• Fair trade-off ($to = 1$), when the performance gain is
equal to the increase in cost. When the trade-off is fair, it
does not make much difference between using expensive
or cheaper instances.

• Cost-effective trade-off ($to > 1$), when the performance
gain is more than the monetary increase. It is profitable
to use an expensive instance type.

• Cost-ineffective trade-off ($to < 1$), when the performance
gain is less than the monetary increase. In this case, it is
advisable to use cheap instance types.


| Instance Type | Cost (per hour) | Performance (seconds per task) |
|---|---|---|
| $it_1$ | 1 | 32 |
| $it_2$ | 2 | 16 |
| $it_3$ | 4 | 8 |
| $it_4$ | 8 | 4 |
| $it_5$ | 16 | 2 |

TABLE I: Fair Trade-off


| Instance Type | Cost (per hour) | Performance (seconds per task) |
|---|---|---|
| $it_1$ | 1 | 32 |
| $it_2$ | 2 | 18 |
| $it_3$ | 4 | 10 |
| $it_4$ | 8 | 6 |
| $it_5$ | 16 | 4 |

TABLE II: Cost-ineffective Trade-off


| Instance Type | Cost (per hour) | Performance (seconds per task) |
|---|---|---|
| $it_1$ | 1 | 32 |
| $it_2$ | 2 | 15 |
| $it_3$ | 4 | 7 |
| $it_4$ | 8 | 3 |
| $it_5$ | 16 | 1 |

TABLE III: Cost-effective Trade-off


| Instance Type | Cost (per hour) | Performance (seconds per task) |
|---|---|---|
| M3.Medium | 0.077 | 87.37 |
| C3.Large | 0.12 | 25.33 |
| M3.Large | 0.154 | 27.08 |
| C3.Xlarge | 0.239 | 12.7 |
| M3.Xlarge | 0.308 | 13.79 |

TABLE IV: Mixed Trade-off
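The trade-off ratio can be checked with a small Python sketch that uses cost and performance values from Tables I to IV. The function name and the tuple layout are illustrative assumptions; the ratio is taken to be the performance ratio (cheaper over more expensive task time) divided by the cost ratio (more expensive over cheaper).

```python
def trade_off(cheap, expensive):
    """Each argument is a pair (cost per hour, seconds per task)."""
    c0, p0 = cheap
    c1, p1 = expensive
    return (p0 / p1) / (c1 / c0)   # performance gain divided by cost increase


print(trade_off((1, 32), (2, 16)))             # Table I, it_1 to it_2
print(round(trade_off((1, 32), (2, 18)), 1))   # Table II, it_1 to it_2
print(round(trade_off((1, 32), (2, 15)), 2))   # Table III, it_1 to it_2
print(round(trade_off((0.077, 87.37), (0.12, 25.33)), 1))   # M3.Medium to C3.Large
```

A value above 1 indicates that the more expensive type buys proportionally more performance than it costs, and a value below 1 indicates the opposite.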

The trade-off is highly specific to a user’s application and 
instance type. For example, if an application can use only one 
CPU core, using expensive instance types with multiple cores 
may not be cost-effective. On the other hand, instance types 
with more cores, each of which has higher clock speed, can 
be beneficial to a parallel application. 

In order to effectively select the optimal execution plan for 
an application, the trade-off between performance gain and 
monetary increase must be taken into account. It is more 
beneficial to have many VMs of the cheap instance type if 
the trade-off is cost-ineffective. On the other hand, with the 
cost-effective trade-off, a user can opt for VMs of expensive 
instance types. 


B. Setup 

This section presents a comparison between using only
one instance type and combining multiple instance types. We
compare two approaches: the first is simple and only uses
the most cost-effective instance type selected by Equation 17
or 20, while the other applies Algorithm 1 to use a combination
of multiple instance types.

Four different scenarios, corresponding to the three trade-off
cases and an additional mixed trade-off case (in which both
cost-effective and cost-ineffective VM types were used), were considered.

For each scenario, we used 10 different values of budget 
and deadline constraints and an application with 1000 tasks. 
The instance start up time, i.e. st, is set to 10 seconds. 

1) Scenario 1 - Fair Trade-off: Table I shows that the
performance and cost of an instance increase in the same
ratio. For example, $it_2$ is two times more expensive than $it_1$
and the time it takes to execute a task is half of that of $it_1$.

2) Scenario 2 - Cost-ineffective Trade-off: Table II shows
that the performance gain is lower than the monetary increase.
For example, $to_{it_1,it_2} = 0.9$.

3) Scenario 3 - Cost-effective Trade-off: Table III shows
that the performance gain is more than the monetary increase.
For example, $to_{it_1,it_2} = 1.07$.

4) Scenario 4 - Mixed Trade-off: Table IV shows instance
types with cost-effective and cost-ineffective trade-offs.
We obtained the performances by executing the genome
pattern searching application on five Amazon instance
types. For example, $to_{M3.Medium,C3.Large} = 2.2$ and
$to_{C3.Large,M3.Large} = 0.7$.

C. Results 

The four scenarios (each scenario took ten values for budget
and deadline constraints) considered above were simulated on a
custom built simulation framework developed using Scala.
The framework took as input the cost and performance of
the instances considered in Tables I to IV. The framework then
executed Algorithm 1 to generate an execution plan. The plan
was then executed and the resulting cost and performance were
found to satisfy the constraints.

Instead of demonstrating the results of overall cost and 
execution time, we compared two approaches (one using single 
instance types and the second using multiple instance types) 
by taking the ratio of their results. When the budget (or 
deadline) constraint is given, the ratio between execution times 
(or actual costs) of using only the most cost-effective instance 
type and using the combination of different ones is noted. Both 
approaches have the same performance if the ratio was equal to 
1 and when multiple instance types are used the performance 
is better if the ratio was greater than 1. Single instance types 
perform better when the ratio was less than 1. 

Figure 1 presents the results for Scenarios 1, 2 and 3. In
Scenarios 1 and 2, the two approaches behaved similarly most of the
time. This is because they both used VMs of the cheapest instance
type, which had either the same (Scenario 1) or more (Scenario
2) performance per cost in comparison with the remaining
instance types. Hence, the remaining budget was not enough
to add any more VMs of other instance types. The only time
when the second approach out-performed the first was when the
deadline was 1800 seconds in Scenario 1 (Figure 1). This can
be explained by looking into the total number of VMs used
by each approach: if only one instance type was used, the
execution plan contained 19 VMs of $it_1$, whose cost was 1.
However, Algorithm 1 replaced 17 VMs of type $it_1$ with one
VM of $it_5$, whose cost was 16. As a result, the cost was
reduced; the execution time increased from 1706
to 1790 seconds, which was still below the deadline constraint.

Fig. 1: Comparison between two approaches for Scenarios 1, 2 and 3
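The figures quoted for Scenario 1 can be reproduced with a short Python sketch. It assumes 1000 tasks, a start-up time of 10 seconds, the values of Table I, and a greedy assignment in which each task goes to the VM that would finish it earliest; the heap-based helper and its names are illustrative and are not taken from the simulation framework.

```python
import heapq
from math import ceil


def makespan(vm_perfs, n_tasks, st):
    """Overall execution time when every task goes to the earliest-finishing VM."""
    # Each heap entry is (finish time if the next task is assigned here, task time).
    heap = [(st + p, p) for p in vm_perfs]
    heapq.heapify(heap)
    finish = st
    for _ in range(n_tasks):
        t, p = heapq.heappop(heap)
        finish = max(finish, t)
        heapq.heappush(heap, (t + p, p))
    return finish


def plan_cost(vm_costs, exec_time):
    """Hourly billing; as a simplification every VM is charged for the makespan."""
    return sum(ceil(exec_time / 3600) * c for c in vm_costs)


single = makespan([32] * 19, n_tasks=1000, st=10)     # 19 VMs of it_1
mixed = makespan([32, 32, 2], n_tasks=1000, st=10)    # 2 VMs of it_1, 1 VM of it_5
print(single, plan_cost([1] * 19, single))
print(mixed, plan_cost([1, 1, 16], mixed))
```

Under these assumptions the single-type plan finishes in 1706 seconds at a cost of 19, while the mixed plan finishes in 1790 seconds at a cost of 18, which is consistent with the reduction in cost described above.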

On the other hand, in Scenario 3, in which the trade-off
was cost-effective, the second approach out-performed the
first approach most of the time (black and squared lines in
Figure 1). This can be explained as follows: due to the
cost-effectiveness, the most expensive instance type was selected
by both Equations 17 and 20. While the first approach only
tried to create as many VMs of the selected instance type as
possible, the second approach took advantage of the remaining
budget by either adding or replacing the existing VMs with
VMs of other types.

Figure 2 presents the result for the mixed trade-off scenario.
It is similar to Scenario 1 or 2 in that there is no significant
difference between using only one or a combination of different
instance types. This can be explained as follows: the two
most cost-effective instance types are C3.Large and C3.Xlarge,
which is understandable since both are CPU optimised
instance types and the genome search application is CPU intensive.
As a result, one of these two instance types was always selected by
Equations 17 and 20. Since Algorithm 1 optimises the execution,
it never selects instance types M3.Large or M3.Xlarge since
they are more expensive and have poorer performance when
compared to C3.Large and C3.Xlarge. C3.Large and C3.Xlarge
demonstrate a fair trade-off ($to_{C3.Large,C3.Xlarge} = 1.002$);
consequently, Scenario 4 is similar to Scenario 1. Even
though VMs of M3.Medium type could be added due to the
remaining budget, there was no significant improvement in
performance since the instance performs poorly.

The experiments show that the hard constraints can be
satisfied by the heuristic algorithm proposed in this paper. The
results from the experiments show that a single instance type
can sometimes be as effective as multiple instance types. As
a general trend, we note that when there is a cost-effective
trade-off, multiple instance types perform better than a single
type of instance.


VI. Related Work 

There is extensive research focusing on executing BoT in 
the Grid environment. The MyGrid framework IJl supports the 
execution of BoT on the distributed environment and improves 
performance by replicating tasks. Similarly, algorithms to 
assign a collection of tasks to grid resources in order to 
minimise execution time has been developed 0. Scheduling 
tasks based on deadline constraints in the grid 0 and based 
on the location of input for each task m are considered. The 
execution of independent but file-sharing tasks is investigated 
and a heuristic algorithm to achieve better performance in 
comparison to greedy approaches is proposed Q. Scheduling 
tasks while satisfying both deadline and budget constraints are 
considered assuming that each task requires distributed data 
at multiple sources 0. Scheduling multiple BoTs is another 
avenue that is investigated 0, Col. Executing parallel tasks 
on a single machine is presented as an alternative approach to 
improve performance 02. 

For clusters, Hadoop's YARN [12] and Mesos [13] are
resource management frameworks that allocate compute
resources to applications in order to maximise performance.
Task scheduling systems such as Apollo 01 and Omega 03 
predict the execution time of each task on a resource and use 












g « 

f5 

CC 

I - 

"3 

o 

0) 

X ■>- 

HI ^ 


cn 

d 


-A- Scenario4 


• A — A A 

A- A- A-' A- A- A- A' ^ 


1.0 






1.2 


1.4 1.6 

Budget 


1.8 


2.0 


(D 

01 CM 


O 

O 


cn 

d 


-A- Scenario4 


' 'A- -A- -A— -A- -A- -A- -A- -A- -A 


T 


T 


2000 


2500 3000 

Deadiine 


3500 


(a) Results for budget constraint 


(b) Results for deadline constraint 


Fig. 2; Comparison between two approaches for Scenario 4 


this prediction for assigning tasks. Sparrow m performs task 
assignment by evaluating the length of a waiting queue of tasks 
to be executed on a resource. 

The cost factor is usually not taken into account while 
scheduling tasks on Grid and Cluster environments. On the 
other hand, in cloud computing, a user has to pay for the 
resources employed, and therefore, the monetary cost must be 
considered. Numerous attempts have been made to model costs 
related to executing BoTs on different platforms. However, 
these models either take into account the cost for transferring 
data and execution of individual tasks 0 or are auction-based 
models which may not be most suited for the cloud ini. 

Recently, many researchers have started to apply cloud 
computing for executing BoT applications. Statistical learning 
and constraint solvers are used to maximise the execution 
performance while satisfying a budget constraint ifTSll . Deadline
constraints have also been investigated by considering
the workload of VMs HD. Methods for optimising the cost
and performance on multiple clouds ll^ and scheduling 
algorithms are considered im. In our previous work, we 
investigated the trade-off between performance and cost when 
executing Bag-of-Distributed-Tasks on the cloud and proposed 
a method to find an execution plan based on a given budget 
constraint ca. 

The research presented in this paper distinguishes itself 
from the current state-of-the-art in many ways. First of all, 
it does not put a limit on the number of cloud resources; most 
research assume a limit, for example, im, Eo), ED, ES- 
The resource limit is defined by either budget or deadline 
constraints assuming the availability of unlimited number 


of resources. Moreover, our approach focuses not only on 
resource provisioning, but also task scheduling; this is often 
not considered in other research, for example, lITSll . lfT9l . Task 
scheduling offers the flexibility of controlling the execution of 
BoT on the cloud. 

VII. Conclusion 

In comparison to other distributed environments, such as the 
grid and the cluster, scheduling tasks on the cloud is complex 
since (i) there are multiple types of resources with different 
performance and varying costs offered on the cloud, and (ii) 
a user can impose a budget or a deadline constraint. It is 
challenging to make a decision on the type or combination 
of resources that can satisfy the constraints. 

To address the above challenge, in this paper, we proposed a 
heuristic algorithm to schedule tasks or Bag-of-Tasks (BoT) on 
the cloud such that the hard constraints imposed by a user can 
be satisfied. The algorithm first generates an execution plan 
comprising the most cost effective resource and then modifies 
the initial plan with different resource types. We evaluated the 
algorithm on four scenarios which were developed by taking 
into account the trade-off between performance and cost of 
the cloud resources. If the trade-off was either fair or cost- 
ineffective, there was not much difference between using a 
single or multiple types of resources. However, if the trade-off
was cost-effective, a combination of different resources was
able to reduce the cost and/or increase the performance. The
experiments confirm that the budget and deadline constraints
can be satisfied.

In the future, we plan to generalise our approach so that it
can be applied to multiple applications. Moreover, dynamic
scheduling and resource provisioning will be investigated so
that the failure of a virtual machine can be handled.